Fingerprinting with Minimum Distance Decoding

Abstract 

This work adopts an information theoretic framework for the design of collusion-resistant
coding/decoding schemes for digital fingerprinting. More specifically, the minimum distance decision rule
is used to identify 1 out of t pirates. Achievable rates, under this detection rule, are characterized 
in two distinct scenarios. First, we consider the averaging attack where a random coding argument 
is used to show that the rate 1/2 is achievable with t = 2 pirates. Our study is then extended to 
the general case of arbitrary t highlighting the underlying complexity-performance tradeoff. Overall, 
these results establish the significant performance gains offered by minimum distance decoding as 
compared to other approaches based on orthogonal codes and correlation detectors which can support 
only a sub-exponential number of users (i.e., a zero rate). In the second scenario, we characterize the 
achievable rates, with minimum distance decoding, under any collusion attack that satisfies the marking 
assumption. For t = 2 pirates, we show that the rate $1 - H(0.25) \approx 0.188$ is achievable using an
ensemble of random linear codes. For $t \ge 3$, the existence of a non-resolvable collusion attack, with
minimum distance decoding, for any non-zero rate is established. Inspired by our theoretical analysis, we
then construct coding/decoding schemes for fingerprinting based on the celebrated Belief-Propagation 
framework. Using an explicit repeat-accumulate code, we obtain a vanishingly small probability of 
misidentification at rate 1/3 under averaging attack with t = 2. For collusion attacks which satisfy 
the marking assumption, we use a more sophisticated accumulate repeat accumulate code to obtain a 
vanishingly small misidentification probability at rate 1/9 with t = 2. These results represent a marked 
improvement over the best available designs in the literature. 

EDICS: WAT-FING
I. INTRODUCTION 

Digital fingerprinting is a paradigm for protecting copyrighted data against illegal distribution 
[1]. In a nutshell, a distributor, i.e., the provider of copyrighted data, wishes to distribute its data 
D among a number of licensed users. Each licensed copy is identified with a mark, which will 
be referred to as a fingerprint in the sequel, composed of a set of redundant digits embedded 
inside the copyrighted data. The locations of the redundant digits are kept hidden from the users 
and are only known to the distributor. Their positions, however, remain the same for all users. 
If any user re-distributes its data in an unauthorized manner, it will be easily identified by its
fingerprint. However, several users may collude to form a coalition enabling them to produce an 
unauthorized copy which is difficult to trace. In the literature, the colluding members are typically 
referred to as pirates or colluders. Hence, the need arises for the design of collusion-resistant 
digital fingerprinting techniques. Our work develops an information theoretic framework for the 
design of low complexity pirate-identification schemes. 

To enable a succinct development of our results, we first consider the widely studied averaging 
attack [2]. The colluders, in this strategy, average their media contents to produce the forged 
copy. An explicit fingerprinting code construction for this attack was proposed in [2]. In this 
construction, however, the maximum number of users M grows only polynomially with the
fingerprinting code-length n (more precisely, $M = O(n^2)$). Clearly, this rate of growth corresponds
to a zero rate in the information theoretic sense. This motivates our pursuit for a fingerprinting 
scheme which supports an exponentially growing number of users, with the code-length, while 
allowing for low complexity pirate-identification strategies. Towards this goal, we use a random 
coding argument to establish the existence of a rate 0.5 linear fingerprinting code which achieves 
a vanishingly small probability of misidentification when 1) only t = 2 pirates are involved in
the averaging attack and 2) the low complexity minimum distance (MD) decoder is used to
identify one of the two pirates. The enabling observation is the intimate connection between the 
scenario under consideration and the binary erasure channel (BEC). This result is then extended 
to the general case with an arbitrary coalition size t where the tradeoff between complexity and 
performance is highlighted. 

Building on our analysis for the averaging attack, we then proceed to fingerprinting strategies 
which are resistant to more general forging techniques. More specifically, we adopt the marking 
assumption first proposed in [1]. In this framework, the pirates attempt to identify the positions 
occupied by the fingerprinting digits by comparing their copies. Afterwards, they can only modify 
the identified coordinates, in any desired way, to minimize the probability of traceability. The 
validity of the marking assumption hinges on the assumption that any modification to the data 
content D will damage it permanently. This prevents the users from modifying any location
which they do not identify as a fingerprinting digit, since it may be a data symbol. Boneh and
Shaw [1] were the first to construct fingerprinting codes that are resistant to attacks that satisfy 
the marking assumption. This approach was later extended in [3] using the idea of separating 
codes [4]. To the best of our knowledge, the best available explicit binary fingerprinting codes
are the low rate codes presented in [3]. For example, for t = 2, the best available code has a
rate $\approx 0.0092$. More recently, upper and lower bounds on the binary fingerprinting capacity for
t = 2 and t = 3 were derived in [5]. The decoder used in [5], however, was based on exhaustive
search, and hence, would suffer from an exponentially growing complexity in the code length.
This prohibitive complexity motivates our proposed approach. In this paper, we show that using
linear fingerprinting codes and MD decoding, one can achieve rates less than 0.188 when the
coalition size is t = 2. Unfortunately, the proposed approach does not scale for $t \ge 3$. This
negative result calls for a more sophisticated identification technique inspired by the analogy 
between our set-up and multiple access channels. Our results in this regard will be reported 
elsewhere. 

Since the complexity of the exact MD decoder can be prohibitive when the code-length 
is long, we develop a low complexity belief-propagation (BP) identification approach [6] [7]. 
This detector only requires a linear complexity in n, and offers remarkable performance gains
over the best known explicit constructions for fingerprinting [3], [2]. For example, we propose
a modified iterative decoder tailored for the averaging attack with t = 2. Using this decoder
along with an explicit repeat-accumulate (RA) fingerprinting code, we achieve a vanishingly
small probability of misidentification for rates up to 1/3. For the marking assumption set-up, we 
achieve a vanishingly small misidentification probability for rates up to 1/9 using the recently 
proposed class of low rate accumulate repeat accumulate (ARA) codes [8]. It is worth noting 
that these results represent a marked improvement over the state of the art in the literature. 
Furthermore, one would expect additional performance enhancement by optimizing the degree 
sequences of the codes (which is beyond the scope of this work). 

The rest of the paper is organized as follows. In Section II, we introduce the mathematical
notations and formally define our problem setup. Then we explore the theoretical limits of
fingerprinting using the MD decoder in Sections III and IV. The simulation results based on the
BP framework are presented in Section V. Finally, Section VI offers some concluding remarks.

II. Notations and Problem statement 

Throughout the paper, random variables and their realizations are denoted by capital letters and
the corresponding lower case letters, respectively. Deterministic vectors are denoted by bold-face
letters. We denote the entropy function by $H(\cdot)$, with the argument being the probability mass
function. Furthermore, for simplicity, we abbreviate $H(p, 1 - p)$ by $H(p)$, where $0 \le p \le 1$.
For two functions of n, we write $a(n) \doteq b(n)$ if $\lim_{n \to \infty} \frac{\log a(n)}{\log b(n)} = 1$; for example, $\binom{n}{m} \doteq 2^{nH(m/n)}$.
The Hamming distance between two vectors $\mathbf{x}_1, \mathbf{x}_2$ is denoted by $d_H(\mathbf{x}_1, \mathbf{x}_2)$. Without loss of
generality, we assume that the number of users is M, and hence, a coalition U of size t is a subset
of $\{1, 2, \ldots, M\}$ where $|U| = t$. The goal of the coalition, in a nutshell, is to produce a forged
fingerprint, y, such that the distributor will not be able to trace it back to any of its members. In 
the following, we first introduce the notation that will be used for a general attack satisfying the 
marking assumption and then specify our notations for the averaging attack scenario. It should be 
noted that our formulation follows in the footsteps of [5]. For completeness, however, we repeat 
it here. As mentioned in [1], deterministic fingerprinting under the marking assumption is not 
possible in general. Therefore, the distributor needs to employ some kind of randomization which 
leads to a collection of binary codes (F, G) composed of K pairs of encoding and decoding 
functions as: 

$$f_k : \{1, 2, \ldots, M\} \to \{0, 1\}^n, \tag{1}$$

$$g_k : \{0, 1\}^n \to \{1, 2, \ldots, M\},$$

$$k = 1, 2, \ldots, K,$$

where the code rate R is $\frac{\log_2 M}{n}$ and the secret key, k, is a random variable employed to randomize
the codebook. This way, the exact codebook utilized for fingerprinting is kept hidden from the 
users. It should be noted that, adhering to common conventions in cryptography, the family of 
encoding and decoding functions as well as the probability distribution of the secret key, p(k), 
are known to all users. Finally, it is clear from the definition of $g_k$ that the objective of the
distributor, in our formulation, is to identify only one of the colluders correctly.

For simplicity of presentation, let us assume that t = 2. Then the fingerprints corresponding to
the coalition of users (also referred to as pirates or colluders) $u_1, u_2$ are denoted by $\{\mathbf{x}_1, \mathbf{x}_2\}$.
The marking assumption implies that position i is undetectable to the two colluders if $x_{1i} = x_{2i}$;
otherwise it is called detectable [1]. The undetectable coordinates cannot be changed by the
pirates, and hence, the set of all possible forged copies is given by

$$E(U) = \{\mathbf{y} \in \{0, 1\}^n \mid y_i = x_{1i}, \ \forall i \text{ undetectable}\}. \tag{2}$$
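To make the marking assumption concrete, the following Python sketch enumerates the detectable positions and the set E(U) of (2) for a toy pair of fingerprints. The function names `detectable_positions` and `envelope`, the 0-based position indices, and the example vectors are illustrative assumptions and do not appear in the original formulation.

```python
from itertools import product

def detectable_positions(x1, x2):
    """Indices (0-based) where the two pirates' fingerprints differ."""
    return [i for i, (a, b) in enumerate(zip(x1, x2)) if a != b]

def envelope(x1, x2):
    """All forged copies y allowed by the marking assumption, i.e. E(U) in (2)."""
    det = detectable_positions(x1, x2)
    forged = []
    for bits in product((0, 1), repeat=len(det)):
        y = list(x1)                     # undetectable positions are copied
        for pos, b in zip(det, bits):    # detectable positions are free
            y[pos] = b
        forged.append(tuple(y))
    return forged

x1 = (0, 1, 1, 0, 1)
x2 = (0, 0, 1, 1, 1)
print(detectable_positions(x1, x2))      # the two positions where x1 and x2 differ
print(len(envelope(x1, x2)))             # 2 ** (number of detectable positions)
```

The enumeration is exponential in the number of detectable positions, so it is only meant for small examples.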
In general, a coalition U may utilize a random strategy that satisfies the marking assumption
to produce y. That is, if $V(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$ is the probability that y is created, given the coalition
$\{\mathbf{x}_1, \mathbf{x}_2\}$, then we have:

$$V(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2) = 0 \quad \text{for all } \mathbf{y} \notin E(U). \tag{3}$$

In this paper, we focus on the maximum probability of misidentification over the set of all
strategies which satisfy (3) (denoted by $\mathcal{V}$ in the sequel). Similar to [5], we average the probability
of misidentification over all possible coalitions leading to the following performance metric:

$$P_m(F, G) := \frac{1}{\binom{M}{t}} \sum_{U} \max_{V \in \mathcal{V}} P_m(U, F, G, V), \tag{4}$$

where

$$P_m(U, F, G, V) := E_K\Big( \sum_{\mathbf{y} \in E(U),\ g_k(\mathbf{y}) \notin U} V(\mathbf{y} \mid f_k(U)) \Big).$$

In the case of an averaging attack, we employ the typical assumption of mapping the binary
fingerprints into the antipodal alphabet $\{-1, +1\}$, where the encoder is now defined as [2]

$$f : \{1, 2, \ldots, M\} \to \{-1, +1\}^n. \tag{5}$$

As anticipated from the name, the forged copy is now given by:

$$\mathbf{y} = \frac{1}{t} \sum_{i=1}^{t} \mathbf{x}_i, \tag{6}$$

where the addition is over the real field. The decoder is now defined as

$$g : \mathcal{Y}^n \to \{1, 2, \ldots, M\}, \tag{7}$$

where $\mathcal{Y}$ is the alphabet of y; for example, it is $\{-1, 0, +1\}$ when t = 2. Misidentification
will happen if $g(\mathbf{y}) \notin U$. Note that for t = 2, if $g(\mathbf{y}) \in U$, i.e., we trace one colluder correctly,
then we can always trace the other colluder correctly according to (6). In this special case, the
performance metric in (4) reduces to

$$P_m := \frac{1}{\binom{M}{t}} \sum_{U} \mathbb{1}\big(g(\mathbf{y}) \notin U\big). \tag{8}$$
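The averaging attack of (5) and (6) can be illustrated with a few lines of Python. This is a minimal sketch: the helper names `to_antipodal` and `averaging_attack`, the choice of mapping 0 to -1 and 1 to +1, and the toy fingerprints are assumptions made here for illustration only.

```python
def to_antipodal(bits):
    """Map {0, 1} to {-1, +1} (assumed convention: 0 -> -1, 1 -> +1)."""
    return [2 * b - 1 for b in bits]

def averaging_attack(fingerprints):
    """Forged copy y = (1/t) * sum of the t antipodal fingerprints, as in (6)."""
    t = len(fingerprints)
    n = len(fingerprints[0])
    return [sum(x[i] for x in fingerprints) / t for i in range(n)]

x1 = to_antipodal([0, 1, 1, 0, 1])
x2 = to_antipodal([0, 0, 1, 1, 1])
y = averaging_attack([x1, x2])

# For t = 2, a zero in y marks a position where the two pirates disagree.
erased = [i for i, v in enumerate(y) if v == 0]
print(y)
print(erased)
```

For t = 2 every entry of y lies in {-1, 0, +1}, and the zero entries are exactly the detectable positions.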
III. The Averaging Attack

In this section, we investigate the theoretically achievable rate of fingerprinting codes with
the minimum distance (MD) decoder under the averaging attack. First, we need the following
definition.

Definition 1: We say that the capacity of an ensemble of fingerprinting codebooks $\mathcal{E}$ is $R_{\mathcal{E}}$
under MD decoding if

1) For $M = 2^{nR}$ with $R < R_{\mathcal{E}}$, the average probability of misidentification over the ensemble
$P_m$ using MD decoding goes exponentially to zero as the codelength n goes to infinity.

2) Conversely, for $M = 2^{nR}$ with $R > R_{\mathcal{E}}$, there exists a constant $\delta > 0$ such that $P_m > \delta$
for sufficiently large block lengths.

Note that the converse in the previous definition is applicable only to a specific family of codes,
similar to the approach taken in [6], [7]. We also say that a rate is MD-achievable if only the first
part of Definition 1 is met. We are now ready to prove our first result.

Theorem 1: The fingerprinting capacity of the i.i.d. codebook ensemble when t = 2 is $R_{\mathcal{E}} = 0.5$
(under the averaging attack and the MD decoder).

Proof: The encoder and decoder are as follows.
Encoder: The encoder chooses codewords uniformly and independently from all $2^n$ different
vectors belonging to $\{0, 1\}^n$, maps the fingerprinting codeword alphabet from $\{0, 1\}$ to
$\{-1, +1\}$, and assigns the fingerprints to the users.

Decoder: Given the forged fingerprint y, the decoder treats each position i where $y_i = 0$ as
an erased position, and the others as unerased positions. Let $\mathcal{E}$ be the set of erasure positions
and $\bar{\mathcal{E}} := [1 : n] \setminus \mathcal{E}$. Also let $\mathbf{y}_{\bar{\mathcal{E}}}$ denote those components of y which are indexed by $\bar{\mathcal{E}}$. The
decoder searches the codebook to find a codeword which agrees with y in all unerased
positions $\mathbf{y}_{\bar{\mathcal{E}}}$. Once the decoder finds such a codeword, it declares the corresponding user as the pirate. A
misidentification occurs when the codeword z of an innocent user is consistent with y.
Achievability: For a small $\epsilon$, we say the assigned fingerprints $\mathbf{x}_1, \mathbf{x}_2$ are close if $d_H(\mathbf{x}_1, \mathbf{x}_2) < n(\frac{1}{2} + \epsilon)$;
here the fingerprinting alphabets are $\{0, 1\}$ before transformation. As shown in Appendix
I-A, with high probability, $(\mathbf{x}_1, \mathbf{x}_2)$ is a close pair. Thus, given a small $\epsilon > 0$,

$$|\mathcal{E}| < n\left(\tfrac{1}{2} + \epsilon\right), \tag{9}$$

since the erasures happen where the bits of $(\mathbf{x}_1, \mathbf{x}_2)$ are different. For the given forged fingerprint
y, z must agree with y in all $n - |\mathcal{E}|$ unerased positions, and can be -1 or +1 in the remaining $|\mathcal{E}|$
erased positions. The probability of choosing such a codeword is upper-bounded by

$$\frac{2^{n(1/2 + \epsilon)}}{2^{n}} = 2^{-n(1/2 - \epsilon)}. \tag{10}$$

By using the union bound, we know that for $R < 1/2 - \epsilon$, the probability of misidentification
$P_m$ tends to zero exponentially fast for sufficiently large codeword length n.

Converse: From (23) in the Appendix, we know that $P(|\mathcal{E}| \ge n/2) \ge P(|\mathcal{E}| = n/2) = \delta$,
where $\delta$ is non-vanishing with respect to the codeword length n. For a fingerprinting codeword x,
we form $\mathbf{x}_{\bar{\mathcal{E}}}$ as the components of x which are indexed by $\bar{\mathcal{E}}$, and we arrange all $\mathbf{x}_{\bar{\mathcal{E}}}$ in the
fingerprinting codebook as rows of a $2^{nR} \times (n - |\mathcal{E}|)$ array $X_{\bar{\mathcal{E}}}$. Misidentification happens
if $\mathbf{y}_{\bar{\mathcal{E}}}$ equals more than two rows of $X_{\bar{\mathcal{E}}}$. With $R > 1/2$, $|\mathcal{E}| \ge n/2$, and sufficiently large n,

$$2^{nR} - 2 > 2^{(n - |\mathcal{E}|)} - 1, \tag{11}$$

and misidentification will happen with probability at least 1/3. From the above, we know that if
$R > 1/2$, the misidentification probability will be larger than $\delta/3$ for sufficiently large n, which
concludes the proof.

□ 
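The decoder used in the proof of Theorem 1 amounts to an exhaustive consistency check against the codebook. The sketch below simulates one such trial; the function names, the toy parameters (n = 40, R = 0.25), and the convention that users 0 and 1 are the pirates are assumptions chosen for illustration, and a single trial says nothing about asymptotic behaviour.

```python
import random

def consistent_users(codebook, y):
    """Indices of users whose codeword matches y on every unerased position."""
    return [u for u, x in enumerate(codebook)
            if all(yi == 0 or yi == xi for xi, yi in zip(x, y))]

def run_trial(n, rate, rng):
    """Draw an i.i.d. antipodal codebook, let users 0 and 1 average, then decode."""
    m = max(3, int(2 ** (n * rate)))             # number of users M = 2^{nR}
    codebook = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    y = [(a + b) / 2 for a, b in zip(codebook[0], codebook[1])]
    return consistent_users(codebook, y)

candidates = run_trial(n=40, rate=0.25, rng=random.Random(0))
print(candidates)   # always contains 0 and 1; any other index is an innocent user
```

The two pirates are always consistent with y, which is the structural difference from the classical BEC discussed after the proof.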

Intuitively, the i.i.d. generated codebook will result in $|\mathcal{E}| \approx n/2$ erased positions
with high probability [5]. Then the "channel" between one of the pirates $\mathbf{x}_1$ and the forged
fingerprint y can be approximated by a binary erasure channel (BEC) with erasure probability
1/2. From [9], we know that the capacity of this channel using the MD decoder is 1/2. However, in
the two-pirate fingerprinting system, there are always two codewords $\mathbf{x}_1$ and $\mathbf{x}_2$ in the codebook
which meet the MD decoding criterion. This is the fundamental difference between this system
and the classical BEC. In the BEC, with high probability, only one codeword
will meet the MD decoding criterion. As will be presented in Section V-A, this difference will
have an important implication on the design of Belief Propagation decoders for fingerprinting.
The following result shows that restricting ourselves to the class of linear fingerprinting codes does
not entail any performance loss (at least from an information theoretic perspective).

Theorem 2: The fingerprinting capacity of the binary linear ensemble with t = 2 is $R_{\mathcal{E}} = 0.5$
(under the averaging attack and the MD decoder).

Proof: We consider the ensemble of binary linear codes of length n and dimension $n - l$
defined by the $l \times n$ parity check matrix H, where each entry of H is an i.i.d. Bernoulli random
variable with parameter 1/2. The code rate is $R = 1 - l/n$.

Encoder: The encoder chooses one codebook from this linear code ensemble, maps the
fingerprinting codeword alphabet from $\{0, 1\}$ to $\{-1, +1\}$, and assigns the fingerprints to the
users.

Decoder: Given the forged fingerprint y, again the decoder treats each position i where
$y_i = 0$ as an erased position, and the others as unerased positions. The decoder also maps
the alphabet of the unerased positions from $\{-1, +1\}$ back to $\{0, 1\}$. Let $H_{\mathcal{E}}$ denote the submatrix
of H that consists of those columns of H which are indexed by the set of erasures $\mathcal{E}$. In a similar
manner, let $\mathbf{x}_{\mathcal{E}}$ denote those components of the pirate's fingerprint which are indexed by $\mathcal{E}$, and
$\mathbf{x}_{\bar{\mathcal{E}}}$ denote those components which are indexed by $\bar{\mathcal{E}}$. In the following, we assume that the
fingerprinting codeword alphabets are mapped back to $\{0, 1\}$ and that addition is modulo-2.
Note that the true pirates $\mathbf{x}_1$ and $\mathbf{x}_2$ result in the same $\mathbf{x}_{\bar{\mathcal{E}}} = \mathbf{y}_{\bar{\mathcal{E}}}$, where $\mathbf{y}_{\bar{\mathcal{E}}}$ is defined as in
Theorem 1. From the parity check equations,

$$H_{\mathcal{E}}\, \mathbf{x}_{\mathcal{E}}^T = \mathbf{s}^T, \tag{12}$$

where $\mathbf{s}^T := H_{\bar{\mathcal{E}}}\, \mathbf{y}_{\bar{\mathcal{E}}}^T$ is called the syndrome. The syndrome is known at the decoder. The decoder
solves these linear equations to find $\mathbf{x}_{\mathcal{E}}$, combines it with the known $\mathbf{x}_{\bar{\mathcal{E}}} = \mathbf{y}_{\bar{\mathcal{E}}}$, and declares one
of the results as the pirate.

Achievability: We know that (12) has at least two solutions corresponding to the true pirates $\mathbf{x}_1$
and $\mathbf{x}_2$. The rank of the $l \times |\mathcal{E}|$ matrix $H_{\mathcal{E}}$ must equal $|\mathcal{E}| - 1$ to make sure that there are only
two solutions. The decoder will declare an innocent user as the pirate if there are more than two
solutions, iff $H_{\mathcal{E}}$ has rank less than $|\mathcal{E}| - 1$. This happens with probability

where $M_b(l_1, m_1, k_1)$ denotes the number of binary matrices with dimension $l_1 \times m_1$ and rank $k_1$.

To make (13) approach zero as n increases, the second term in (13) must approach one as n
goes up. To show this, we first assume that $|\mathcal{E}| + n\epsilon_1 < l$, where $\epsilon_1 > 0$ is a small number.
According to (28) in Appendix II and [10], the second term in (13) equals



M 6 (|e| 
From [10], for $j = 0, \ldots, |\mathcal{E}| - 1$,

$$M_b(|\mathcal{E}| - j,\ l,\ |\mathcal{E}| - j) = \prod_{p=0}^{|\mathcal{E}| - j - 1} \left(2^{l} - 2^{p}\right). \tag{15}$$
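The counting formula (15) for full-rank binary matrices can be sanity-checked by brute force for small sizes. In the sketch below, k plays the role of $|\mathcal{E}| - j$ (the number of rows, equal to the rank) and l is the number of columns; the function names and the bit-mask representation of rows are assumptions made for this illustration.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bit masks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)        # cancel the leading bit of b if r has it
        if r:
            basis.append(r)
    return len(basis)

def count_full_rank(k, l):
    """Brute-force count of k x l binary matrices with rank k."""
    return sum(gf2_rank(rows) == k
               for rows in product(range(2 ** l), repeat=k))

def formula(k, l):
    """Right-hand side of (15): product over p = 0..k-1 of (2^l - 2^p)."""
    out = 1
    for p in range(k):
        out *= 2 ** l - 2 ** p
    return out

for k, l in [(2, 3), (3, 3), (2, 4)]:
    print(k, l, count_full_rank(k, l), formula(k, l))
```

The brute-force count enumerates all $2^{kl}$ matrices, so it is only practical for very small k and l.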

Using this formula in (14) and dividing the numerator and denominator by $M_b(|\mathcal{E}| - 1, l, |\mathcal{E}| - 1)$,
this term equals

2l*l _ l 

2CI-I—D + 2'[-i + nt=o 2 1/(1 - 2p "0] ' (1 } 

Note that since $n\epsilon_1 < l - |\mathcal{E}|$, each $2^{p-l}$ approaches zero exponentially fast with n. By using a Taylor
series on $1/(1 - 2^{p-l})$, and with some simplifications, the denominator becomes

$$2^{(|\mathcal{E}| - 1)} + \sum_{p=0}^{|\mathcal{E}| - 2} 2^{p} + 2^{|\mathcal{E}|} \cdot \text{h.o.t.} = 2^{|\mathcal{E}|} \cdot (1 + \text{h.o.t.}) - 1, \tag{17}$$

where the higher order terms of the Taylor series are denoted by h.o.t. and approach zero
exponentially fast. Using this result in (16), our claim is valid and (13) approaches zero as
$n \to \infty$ if $|\mathcal{E}| + n\epsilon_1 < l$.

As shown in Appendix I-B, $|\mathcal{E}| < n(1/2 + \epsilon)$ with high probability. Hence, if
$n(1/2 + \epsilon) + n\epsilon_1 < l$, or equivalently $R < 1/2 - (\epsilon + \epsilon_1)$, the probability of misidentification can be made arbitrarily small.

Converse: From (26) in the Appendix, we know that $P(|\mathcal{E}| \ge n/2) \ge P(|\mathcal{E}| = n/2) = \delta$, where
$\delta$ is non-vanishing with respect to the codeword length n. With $R > 1/2$ and sufficiently large
n, $P(|\mathcal{E}| - 1 > l) \ge \delta$. In this case, the rank of $H_{\mathcal{E}}$ is less than $|\mathcal{E}| - 1$ and the syndrome
decoder will find at least three solutions of equation (12). Misidentification will then happen
with probability at least 1/3. From the above, we know that if $R > 1/2$, the probability will
be larger than $\delta/3$ for sufficiently large n, which concludes the proof.

□ 
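The role of the rank of $H_{\mathcal{E}}$ in the proof of Theorem 2 can be seen on a toy instance of (12). The sketch below enumerates all solutions by brute force instead of Gaussian elimination, purely to keep the example short; the matrices, the pirate pattern, and the function name `solve_erasures` are hypothetical.

```python
from itertools import product

def solve_erasures(H_E, s):
    """All x_E in {0,1}^|E| with H_E x_E^T = s^T over GF(2) (brute force)."""
    width = len(H_E[0])
    return [x for x in product((0, 1), repeat=width)
            if all(sum(h * v for h, v in zip(row, x)) % 2 == si
                   for row, si in zip(H_E, s))]

def syndrome(H_E, x_E):
    """Syndrome produced by one pirate's bits on the erased positions."""
    return [sum(h * v for h, v in zip(row, x_E)) % 2 for row in H_E]

pirate = (1, 0, 1)                     # the other pirate is the complement (0, 1, 0)

# Rank |E| - 1 = 2: exactly the two pirates solve the system.
H_good = [[1, 1, 0], [0, 1, 1]]
print(solve_erasures(H_good, syndrome(H_good, pirate)))

# Rank 1 < |E| - 1: extra solutions appear, so an innocent user may be accused.
H_bad = [[1, 1, 0]]
print(solve_erasures(H_bad, syndrome(H_bad, pirate)))
```

On the erased positions the two pirates are bitwise complements, which is why the all-ones vector must lie in the null space of $H_{\mathcal{E}}$ and the rank can be at most $|\mathcal{E}| - 1$.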

Next, our approach is generalized to coalitions with t > 2. The key to the following corollary
is to treat all alphabet symbols other than ±1 in $\mathcal{Y}$ of (7) as erasures.

Corollary 1: The rate $\frac{1}{2^{(t-1)}}$ is MD-achievable for fingerprinting under the averaging attack with a
coalition of size t.



Proof: The encoder/decoder are the same as the ones in Theorem 1 except for the choice
of erasure positions as described previously. Note that $y_i \neq \pm 1$ whenever the pirates' fingerprint
bits are not all the same at position i. Similar to [5], we know that with high probability, the i.i.d.
generated codebooks will meet

$$|\mathcal{E}| < n\left(1 - \frac{1}{2^{t-1}} + \epsilon\right).$$

Then, following in the footsteps of the proof of Theorem 1, we obtain our result. □
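The rate in Corollary 1 equals the fraction of positions that remain unerased, i.e., positions where all t pirates hold the same bit. The short sketch below computes this fraction exactly by enumerating bit patterns; the function name `agreement_fraction` is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

def agreement_fraction(t):
    """Probability that t i.i.d. uniform bits all agree (position stays unerased)."""
    patterns = list(product((0, 1), repeat=t))
    agree = sum(1 for p in patterns if len(set(p)) == 1)
    return Fraction(agree, len(patterns))

for t in range(2, 6):
    # Compare with the closed form 1 / 2^(t-1) from Corollary 1.
    print(t, agreement_fraction(t), Fraction(1, 2 ** (t - 1)))
```

For t = 2 this recovers the erasure probability 1/2 used in Theorem 1.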



The advantage of the MD decoder, used to obtain the previous result, is its universality for
all t. However, for each t, we can obtain higher rates by tailoring our encoder/decoder to this
specific case. To illustrate the idea, let us consider the t = 3 case. Now, $\mathcal{Y} = \{\pm 1, \pm\frac{1}{3}\}$ and one
can achieve better performance by exploiting the information contained in the positions with
$y_i = \pm\frac{1}{3}$.

Theorem 3: The rate $H(\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}) - H(\frac{1}{4}, \frac{1}{2}, \frac{1}{4}) = 0.3113$ is achievable for fingerprinting under
the averaging attack with t = 3.

Proof: The encoder is the same as in Theorem 1. As for the decoder, we first define X as
a random variable with $P(X = \pm 1) = 1/2$, and the random variable $Y = (X + X_2 + X_3)/3$,
where $X_2, X_3$ have the same distribution as X and $(X, X_2, X_3)$ are independent. The transition
matrix of $P(Y|X)$ is

$$\begin{array}{c|cccc} & Y = -1 & Y = -\tfrac{1}{3} & Y = \tfrac{1}{3} & Y = 1 \\ \hline X = -1 & \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} & 0 \\ X = +1 & 0 & \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \end{array}$$

Typically, we need a maximum likelihood (ML) decoder designed for the transition matrix
$P(Y|X)$. Note that when t = 2, this decoder reduces to the one specified in Theorem 1. However,
it is hard to investigate the performance of the ML decoder, and we use the jointly-typical
decoder defined in [9] to obtain a lower bound on the achievable rate of this decoder. Given a forged
fingerprint y, the decoder searches the codebook to find a codeword such that this codeword
and y are jointly typical with respect to $P(X, Y)$. Once the decoder finds such a codeword, the
decoder declares it as the pirate.

Achievability: Without loss of generality, we can assume that the pirates' indices are (1, 2, 3).
The event $E_i$ occurs when the ith codeword and y are jointly typical, and the event $E_i^c$ is its
complement. Then the probability of misidentification $P_m$ is upper-bounded by

$$P_m \le P(E_1^c) + P(E_2^c) + P(E_3^c) + \sum_{i \neq 1, 2, 3} P(E_i).$$

From [9, Theorem 15.2.1], the first three terms can be made less than any arbitrarily small
$\epsilon > 0$ for sufficiently large n. And the last term is upper-bounded by

$$(M - 3)\, 2^{-n(I(X;Y) - 4\epsilon)}.$$

So if $R < I(X; Y) - 4\epsilon$, $P_m$ can be made arbitrarily small for sufficiently large n. According to
the transition matrix of $P(Y|X)$, we know that

$$I(X; Y) = H\left(\tfrac{1}{8}, \tfrac{3}{8}, \tfrac{3}{8}, \tfrac{1}{8}\right) - H\left(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\right),$$

which concludes the proof. □
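The numerical value in Theorem 3 can be reproduced directly from the definition $Y = (X + X_2 + X_3)/3$. The sketch below is only a numerical check; the function names are assumptions, and the code tracks the integer sum 3Y rather than Y to avoid fractions.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information_t3():
    """I(X;Y) for Y = (X + X2 + X3)/3 with i.i.d. uniform +/-1 inputs."""
    joint = Counter()                          # keys: (x, 3*y), kept as integers
    for x, x2, x3 in product((-1, 1), repeat=3):
        joint[(x, x + x2 + x3)] += 1 / 8
    p_y = Counter()
    for (_, s), p in joint.items():
        p_y[s] += p
    h_y = entropy(p_y.values())                    # H(1/8, 3/8, 3/8, 1/8)
    h_y_given_x = entropy(joint.values()) - 1.0    # H(X,Y) - H(X), with H(X) = 1 bit
    return h_y - h_y_given_x

print(round(mutual_information_t3(), 4))   # 0.3113, the rate stated in Theorem 3
```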

IV. The Marking Assumption 

Having studied the special case of the averaging attack, we now proceed to the case when the
coalition can employ any strategy as long as the marking assumption is satisfied. The following
result establishes the achievable rate of random fingerprinting codes with MD decoding.

Theorem 4: For all rates less than $1 - H(0.25)$ there exists an MD-achievable fingerprinting
code, when t = 2.



Proof: We use a random coding argument to prove our result. We construct the following
ensemble of binary random codes as in Theorem 1. Binary random vectors (fingerprints) of
length n are assigned to the $M = 2^{nR}$ users, where each coordinate is chosen independently with
equal probability of being 0 or 1. For a small $\epsilon$, we say the assigned fingerprints $\mathbf{x}_1, \mathbf{x}_2$ are close
if $d_H(\mathbf{x}_1, \mathbf{x}_2) < n(\frac{1}{2} + \epsilon)$. If the pair $(\mathbf{x}_1, \mathbf{x}_2)$ is close we denote it by $\mathbf{x}_1 \overset{c}{\leftrightarrow} \mathbf{x}_2$; otherwise, for
a non-close pair, we write $\mathbf{x}_1 \overset{c}{\nleftrightarrow} \mathbf{x}_2$. Given a forged fingerprint y, the average probability of
misidentification over this ensemble can be upper bounded by:

$$P_m(\mathbf{y} \mid \mathbf{x}_1 \overset{c}{\leftrightarrow} \mathbf{x}_2) + P(\mathbf{x}_1 \overset{c}{\nleftrightarrow} \mathbf{x}_2),$$

where $P_m(\mathbf{y} \mid \mathbf{x}_1 \overset{c}{\leftrightarrow} \mathbf{x}_2)$ is the misidentification probability when y is produced by a close pair
$(\mathbf{x}_1, \mathbf{x}_2)$ and $P(\mathbf{x}_1 \overset{c}{\nleftrightarrow} \mathbf{x}_2)$ is the probability that the pirates did not constitute a close pair. Both
probabilities are averaged over the random coding ensemble. By the following argument, we will
show that these probabilities go exponentially to zero as n goes to infinity, hence the proof.

In Appendix I-A, we have proved that $P(\mathbf{x}_1 \overset{c}{\nleftrightarrow} \mathbf{x}_2)$ goes to zero as n goes to infinity. Now
we turn to $P_m(\mathbf{y} \mid \mathbf{x}_1 \overset{c}{\leftrightarrow} \mathbf{x}_2)$. Since $d_H(\mathbf{x}_1, \mathbf{x}_2) < n(\frac{1}{2} + \epsilon)$, the Hamming distance of the forged
copy y with at least one of the pirates must be less than $h(n) := n(\frac{1}{4} + \frac{\epsilon}{2})$ due to the marking
assumption. Without loss of generality, we assume this pirate to be $\mathbf{x}_1$. Using minimum Hamming
distance decoding, misidentification occurs if there is another binary vector z of length n in the
codebook such that $d_H(\mathbf{y}, \mathbf{z}) \le d_H(\mathbf{y}, \mathbf{x}_1)$. The total probability of this event in the random
ensemble is upper-bounded by



$$M \cdot \frac{2^{nH(0.25)}}{2^{n}} = M \cdot 2^{-n(1 - H(0.25))},$$

where the union bound is used. The probability of misidentification in a random code of size
$M = 2^{nR}$ is at most

$$2^{-n(1 - H(0.25) - R)}.$$

The above probability goes exponentially to zero as $n \to \infty$ for all rates $R < 1 - H(0.25)$.

□ 
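Two ingredients of the proof of Theorem 4 are easy to check numerically: under the marking assumption a binary forged copy splits the pirates' mutual distance between them, so one pirate is within half of it, and the resulting rate threshold is $1 - H(0.25)$. The sketch below uses a uniformly random choice on detectable positions as one example strategy; this particular attack, the function names, and the parameters are assumptions for illustration.

```python
import random
from math import log2

def binary_entropy(p):
    """H(p) in bits."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def random_marking_attack(x1, x2, rng):
    """Copy undetectable positions; pick detectable ones at random (one possible strategy)."""
    return [a if a == b else rng.choice((0, 1)) for a, b in zip(x1, x2)]

rng = random.Random(1)
n = 1000
x1 = [rng.randint(0, 1) for _ in range(n)]
x2 = [rng.randint(0, 1) for _ in range(n)]
y = random_marking_attack(x1, x2, rng)

# On each detectable position a binary y differs from exactly one pirate.
split_ok = hamming(y, x1) + hamming(y, x2) == hamming(x1, x2)
closest = min(hamming(y, x1), hamming(y, x2))
print(split_ok, closest <= hamming(x1, x2) / 2)
print(round(1 - binary_entropy(0.25), 4))   # rate threshold of Theorem 4
```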
Intuitively, with high probability, the forged copy will be produced by a pair of close pirates.
Therefore, the minimum Hamming distance between the pirates' fingerprints $\mathbf{x}_i$ and the forged copy y is
approximately n/4, implying that we can treat the "channel" between them as a binary symmetric
channel (BSC) with crossover probability 1/4 (whose capacity is $1 - H(0.25)$ [9]). Next, we
extend our result to binary linear codes.

Theorem 5: For all rates less than $1 - H(0.25)$, there exists a linear MD-achievable
fingerprinting code, when t = 2.

Proof: Consider the ensemble of binary linear codes with binary generator matrix
G, where the elements of G are chosen uniformly and independently from $\{0, 1\}$, similar to Theorem
2. The size of matrix G is $(n - l) \times n$, with rate $R = (n - l)/n$ and codeword length n. It
should also be noted that in the following all matrix multiplications and additions are
modulo-2 unless otherwise stated. In order to randomize the codebook, the distributor employs
the following strategy: the secret key vectors are generated as independent binary random vectors
of length n, whose coordinates are chosen to be 0 or 1 independently with probability 1/2. We
denote the vector indexed by secret key k as $\mathbf{k}$. The vector $\mathbf{k}$ is added in the binary domain
to the codeword, and the resulting vector is assigned to the corresponding user. Note that this
operation will not change the detectable positions, where the codewords differ. Given the
forged copy y, the decoder subtracts $\mathbf{k}$ and performs MD decoding. As we mentioned earlier,
the secret key is unknown to the users and is only known to the distributor.

Similar to the proof of Theorem 4, we can upper-bound the probability of misidentification
as

$$P_m(\mathbf{y} \mid \mathbf{x}_1 \overset{c}{\leftrightarrow} \mathbf{x}_2) + P(\mathbf{x}_1 \overset{c}{\nleftrightarrow} \mathbf{x}_2). \tag{18}$$

In Appendix I-B, we have established that over the ensemble of linear random codes described
above, $P(\mathbf{x}_1 \overset{c}{\nleftrightarrow} \mathbf{x}_2)$ also goes to zero as the code length goes to infinity. Now let us consider
$P_m(\mathbf{y} \mid \mathbf{x}_1 \overset{c}{\leftrightarrow} \mathbf{x}_2)$. The codewords assigned to the users, which are the result of the addition of a secret
key to a linear code, can be written as:

$$\mathbf{u}G + \mathbf{k}, \tag{19}$$

where u is an information message vector. Notice that the ensemble defined by (19) is the same
as the ensemble of coset codes introduced in [11]. In our proof, we need the following lemmas for
the coset code ensemble, which are proved in [11].

Lemma 1: The probability of any binary vector v being a codeword in the ensemble defined
by (19) is equal to $2^{-n}$.

Lemma 2: Let $\mathbf{v}_1, \mathbf{v}_2$ be the codewords corresponding to two different information sequences
$\mathbf{u}_1, \mathbf{u}_2$. Then over the ensemble of codes, $\mathbf{v}_1, \mathbf{v}_2$ are statistically independent.

Similar to the proof of Theorem 4, again due to the marking assumption we can assume
$d_H(\mathbf{y}, \mathbf{x}_1) < h(n)$. Using MD decoding, misidentification occurs if there is another binary vector
z of length n in the codebook such that $d_H(\mathbf{y}, \mathbf{z}) \le d_H(\mathbf{y}, \mathbf{x}_1)$. The total number of binary vectors
for which $d_H(\mathbf{y}, \mathbf{z}) \le d_H(\mathbf{y}, \mathbf{x}_1)$ can be upper bounded by $\sum_{i=1}^{h(n)} \binom{n}{i} \doteq 2^{nH(0.25)}$. By Lemma 1
and Lemma 2, over the ensemble each such vector z is a codeword, independently of $\mathbf{x}_1$, with probability
$2^{-n}$. Therefore, the total probability of this event in the ensemble is upper-bounded by:

$$M \cdot 2^{-n(1 - H(0.25))},$$

where again the union bound is used. The probability of misidentification in a random coset
code of size $M = 2^{nR}$ is at most

$$2^{-n(1 - H(0.25) - R)}.$$

The above probability goes exponentially to zero as $n \to \infty$ for all rates $R < 1 - H(0.25)$.

□ 

When the coalition size t is larger than two, minimum distance decoding will fail due to
the following argument. Let t = 3 and assume that the forged copy is produced by

$$\mathbf{y} = \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3,$$

where the additions are modulo-2. It is easy to check that this attack satisfies the marking
assumption. For t > 3, the coalition can consider only three of the pirates, ignore the rest, and
apply this attack. Following the footsteps of the proof of Theorem 3, it is easy to see that the
MD-achievable rate is zero. Indeed, it can also be shown that the resulting "BSC" has
crossover probability 1/2, and this negative result is obtained [9].
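The three-pirate attack above is simple to reproduce. The sketch below checks the marking assumption for $\mathbf{y} = \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3$ (modulo-2) and prints the distances from y to each pirate, which for independent uniform fingerprints are distributed like the weight of a uniformly random vector, i.e., about n/2. Function names and parameters are illustrative assumptions.

```python
import random

def xor_attack(x1, x2, x3):
    """Forged copy y = x1 + x2 + x3 with modulo-2 addition."""
    return [a ^ b ^ c for a, b, c in zip(x1, x2, x3)]

def satisfies_marking(y, pirates):
    """True if y keeps every position on which all pirates agree."""
    return all(yi == col[0]
               for yi, *col in zip(y, *pirates)
               if len(set(col)) == 1)

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

rng = random.Random(3)
n = 2000
pirates = [[rng.randint(0, 1) for _ in range(n)] for _ in range(3)]
y = xor_attack(*pirates)

print(satisfies_marking(y, pirates))
# d_H(y, x1) equals the weight of x2 + x3, and similarly for the other pirates.
print([hamming(y, x) for x in pirates])
```

Since y is at distance about n/2 from every pirate, it looks no closer to them than to an independently drawn innocent fingerprint.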
V. Belief Propagation for Fingerprinting

Implementing the exact minimum distance decoder may require prohibitive complexity
(especially for large codeword lengths). This motivates our approach of using the BP framework to
approximate the MD decoder. More specifically, in this section, we present explicit constructions
of graph-based codes, along with the corresponding BP decoders, which are tailored for the
fingerprinting application.

A. Averaging attack 

As remarked earlier, the two-pirate averaging attack will produce a "channel" almost equivalent 
to the classical BEC. This inspires the use of graphical codes based on the Repeat Accumulate 
(RA) framework [12], such as the nonsystematic irregular RA code of [13] and the irregular 
ARA code of [14], which were shown to be capacity achieving for the BEC. In our simulations, 
we use the original regular RA codes of [12] due to their simplicity and good performance for 
low rate scenarios. It is worth noting that all the techniques discussed in the sequel can be applied 
directly to the irregular codes presented in [13], [14]. For the sake of completeness we review 
briefly the encoding procedure for regular RA codes: first, the information bits are repeated a 
constant number of times (by a regular repetition code) and interleaved. The interleaved bits 
are then accumulated to generate the code symbols. Similarly, one can employ the standard BP 
iterative decoding approach [15] to identify the pirates. However, we argue next that significant
performance improvement can be obtained via a key modification to the iterative decoder.
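The encoding procedure reviewed above can be sketched in a few lines of Python. This is a minimal illustration, assuming a nonsystematic regular RA code with repetition factor q (so the rate is 1/q) and a pseudo-random interleaver; the function name `ra_encode`, the toy size k = 4 and the seed are hypothetical choices, not parameters from the paper.

```python
import random

def ra_encode(info_bits, q, perm):
    """Regular RA encoding: repeat each bit q times, interleave, then accumulate."""
    repeated = [b for b in info_bits for _ in range(q)]   # repetition code
    interleaved = [repeated[p] for p in perm]             # interleaver
    code, acc = [], 0
    for bit in interleaved:                               # accumulator (running modulo-2 sum)
        acc ^= bit
        code.append(acc)
    return code

k, q = 4, 3                      # toy size; rate 1/q = 1/3
rng = random.Random(1)           # fixed seed for a reproducible interleaver
perm = list(range(k * q))
rng.shuffle(perm)

codeword = ra_encode([1, 0, 1, 1], q, perm)
print(codeword)
```

Each accumulator output satisfies a parity check involving the previous output and one interleaved information bit, which is the origin of the degree-3 check nodes referred to later in this section.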

It is well known that the standard iterative algorithm will fail if a stopping set exists in the
erased positions [10]. Unfortunately, a stopping set will always exist in the erased positions
produced by the averaging attack. To see this, it is more convenient to represent the RA code using
the appropriate bipartite Tanner graph containing a set of variable nodes
$\mathcal{V} = \{v_1, v_2, \ldots\}$ and a set of check nodes. The reader is referred to [12], [13], [14]
for more details on the graphical representation of RA codes. A stopping set $s$ is, therefore, a
subset of $\mathcal{V}$ such that all neighbors of $s$ are connected to $s$ at least twice. The standard
BP algorithm can now be stated as follows.

[Standard BP]:

1) Find a check node that satisfies the following:

• This check node is not labelled as "finished".

• The values of all but one of the variable nodes connected to the check node are known.

Set the value of the unknown (erased) variable node to the modulo-2 sum of the other variable
nodes, and label that check node as "finished".

2) Repeat step 1 until all check nodes are labelled as "finished" or the decoding cannot
continue further. If the latter happens, declare a decoding failure.

* In the following, the fingerprinting codeword alphabet is {0, 1} after the decoder transformation, and the addition is modulo-2.
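The two steps of the standard BP algorithm can be sketched as a simple peeling loop. The following Python code is an illustrative sketch only: the representation of the Tanner graph as a list of check neighborhoods, the use of `None` for erased values, and the success criterion (no erased variable remains) are assumptions of this example rather than details taken from the paper.

```python
def standard_bp(checks, values):
    """Peeling decoder for erasures.

    checks: list of lists; checks[j] holds the variable indices of check node j,
            whose values must sum to 0 modulo 2.
    values: list with 0/1 for known variables and None for erased ones.
    Returns (values_after_decoding, success_flag).
    """
    values = list(values)
    finished = [False] * len(checks)
    progress = True
    while progress:                      # step 2: repeat until no check qualifies
        progress = False
        for j, nbrs in enumerate(checks):
            if finished[j]:
                continue
            unknown = [v for v in nbrs if values[v] is None]
            if len(unknown) == 1:        # step 1: all but one neighbor known
                known_sum = sum(values[v] for v in nbrs if values[v] is not None)
                values[unknown[0]] = known_sum % 2
                finished[j] = True
                progress = True
    # Assumption of this sketch: success means every erasure was resolved.
    success = all(v is not None for v in values)
    return values, success

# Toy graph in which the erasures can be peeled off one by one.
decoded, ok = standard_bp([[0, 1, 2], [2, 3]], [1, 0, None, None])
# Toy graph in which every check sees two erasures, so the decoder halts.
stuck, stuck_ok = standard_bp([[0, 1, 2], [1, 2, 3]], [1, None, None, 0])
print(decoded, ok, stuck, stuck_ok)
```

In the second toy example the erased variables form a stopping set, which is the failure mode discussed next.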

It is now easy to see that, when the erased positions contain a stopping set, every check node
adjacent to that set is connected to at least two erased variable nodes, and the decoder will halt
at this point. The following result establishes the limitation of the standard BP decoder in our
fingerprinting scenario.

Proposition 1: Let $\mathcal{V}_{B_1}$ and $\mathcal{V}_{B_2}$ be the sets of values of the variable node
set $\mathcal{V}$ corresponding to pirate fingerprints $x_1$ and $x_2$, respectively, and let
$\mathcal{V}_d$ be the set of variable nodes where the corresponding values in $\mathcal{V}_{B_1}$ and
$\mathcal{V}_{B_2}$ are different. Then $\mathcal{V}_d$ is a stopping set.

Proof: This statement is proved by contradiction. First we assume that $\mathcal{V}_d$ is not a
stopping set. It means that $\exists\, j \in \bigcup_{i \in \mathcal{V}_d} N(i)$ where the check node $j$
has only one neighbor $i'$ in $\mathcal{V}_d$. Here we denote the set of neighbors of node $i$ in the
graph as $N(i)$. For the neighboring variable nodes of this check node, we have

$$\mathcal{V}_{B_1}(i) = \mathcal{V}_{B_2}(i) \quad \forall i \ne i',\ i \in N(j),$$
$$\mathcal{V}_{B_1}(i') = \mathcal{V}_{B_2}(i') + 1, \quad i' \in N(j). \qquad (20)$$

However, from the check equation of this check node

$$\sum_{i \in N(j)} \mathcal{V}_{B_1}(i) = \sum_{i \in N(j)} \mathcal{V}_{B_2}(i) = 0, \qquad (21)$$

where the addition is modulo-2. It is obvious that (20) contradicts (21), since the total
number of variable nodes $i \in N(j)$ such that $\mathcal{V}_{B_1}(i) \ne \mathcal{V}_{B_2}(i)$ should be
even. Thus $\mathcal{V}_d$ is a stopping set.

□ 
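Proposition 1 can be checked exhaustively on a small example. The sketch below is illustrative only: the toy Tanner graph `checks` is hypothetical (it is not an RA code from the paper), and the brute-force enumeration is feasible only because the graph has five variable nodes. For every pair of valid assignments, the set of positions where they differ is tested against the stopping set definition.

```python
from itertools import combinations, product

def is_stopping_set(checks, subset):
    """A stopping set is one in which no check node has exactly one neighbor."""
    s = set(subset)
    return all(len(s.intersection(nbrs)) != 1 for nbrs in checks)

# Hypothetical toy Tanner graph: each inner list is the neighborhood of a check node.
checks = [[0, 1, 2], [1, 2, 3], [0, 3, 4]]
n_vars = 5

# All assignments that satisfy every parity check (modulo-2 sum equal to 0).
codewords = [w for w in product((0, 1), repeat=n_vars)
             if all(sum(w[v] for v in nbrs) % 2 == 0 for nbrs in checks)]

# For each pair of codewords, the difference set must be a stopping set.
all_ok = all(
    is_stopping_set(checks, [i for i in range(n_vars) if a[i] != b[i]])
    for a, b in combinations(codewords, 2)
)
print(len(codewords), all_ok)
```

The check succeeds for the same reason as in the proof: two assignments that both satisfy a parity check must differ in an even number of its neighbors.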

Since, under the averaging attack, the bits of the forged fingerprint are erased wherever the pirate
fingerprints differ in the Tanner graph, $\mathcal{V}_d$ will always be contained in the erased
positions and the iterative decoder will fail.

Fig. 1. Proper variable node to be chosen in step 2 of proposed modified BP algorithm for two-pirate averaging attack.

The modification, presented next, will break the stopping set $\mathcal{V}_d$, and hence, allow the
iterative decoder to proceed forward. The key observation is that for every erased position in
$\mathcal{V}_d$, the pair of pirate bits can take only one of two combinations, {0, 1} or {1, 0}. This
allows us to choose one variable node in this stopping set and set its value to 1. The modified
forged fingerprint will then be "closer" to one of the pirate fingerprints. In summary, the decoder
becomes

[Modified BP for fingerprinting]: 

1) Perform the standard BP algorithm, remove all the "finished" labels and go to step 2.

2) Choose a proper variable node in $\mathcal{V}_d$ (different from previous choices), and set its
value to 1. If the decoder has executed this step more than $N_{max}$ times, declare a decoding
failure and exit.

3) Run the standard BP on the new graph. If the decoder fails, reset the variable nodes to
their original values and go to step 2.
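The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the graph representation, the names `peel` and `modified_bp`, and the toy graph are hypothetical; the candidate nodes are tried in the order supplied by the caller, so the selection rule for a "proper" node is not implemented; and "original values" is interpreted as the state reached after step 1.

```python
def peel(checks, values):
    """Standard BP on erasures: resolve checks that see exactly one erased neighbor."""
    values = list(values)
    progress = True
    while progress:
        progress = False
        for nbrs in checks:
            unknown = [v for v in nbrs if values[v] is None]
            if len(unknown) == 1:
                known_sum = sum(values[v] for v in nbrs if values[v] is not None)
                values[unknown[0]] = known_sum % 2
                progress = True
    return values, all(v is not None for v in values)

def modified_bp(checks, received, candidates, n_max):
    """Guess-and-peel decoder; returns a decoded list or None on failure."""
    base, ok = peel(checks, received)          # step 1
    if ok:
        return base
    tries = 0
    for v in candidates:                       # step 2: a new node on each attempt
        if base[v] is not None:
            continue                           # already resolved, nothing to guess
        if tries == n_max:
            break                              # attempt budget N_max exhausted
        tries += 1
        trial = list(base)
        trial[v] = 1                           # set the guessed node to 1
        decoded, ok = peel(checks, trial)      # step 3
        if ok:
            return decoded
        # on failure the trial is discarded, i.e. the values are reset to `base`
    return None

# Hypothetical toy graph; pirates (0,1,1,0,0) and (1,0,1,1,0) differ in positions 0, 1, 3.
checks = [[0, 1, 2], [1, 2, 3], [0, 3, 4]]
received = [None, None, 1, None, 0]            # averaging attack erases positions 0, 1, 3
result = modified_bp(checks, received, candidates=[0, 1, 3], n_max=2)
print(result)
```

In this toy case, setting position 0 to 1 steers the decoder to the pirate whose bit in that position is 1, while guessing position 1 first would recover the other pirate.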

In step 2, we must make sure that the chosen variable node breaks the stopping set $\mathcal{V}_d$.
The neighboring variable nodes of a degree-3 check node in the RA code are good choices. From the
check equations in (21), the erased variable nodes will appear in pairs. If we set the value of
one of the two erased neighbor variable nodes, $v_i$, to 1, this degree-3 check node is connected
to $\mathcal{V}_d \setminus v_i$ with only one edge. Then $\mathcal{V}_d \setminus v_i$ is not a stopping
set. We also need to choose the variable node which will affect as many other variable nodes in
$\mathcal{V}_d \setminus v_i$ as possible by setting its value.

Fig. 2. Probability of misidentification under two-pirate averaging attack using RA codes with different rates and modified BP
algorithm without variable node selection.

Since all check nodes of the RA code are degree-3, we choose such a variable node $v_i$ in the
degree-2 variable-node-chain of the RA code, as shown in Fig. 1. The check node is depicted as ⊞,
the unerased variable nodes as black circles and the erased ones as hollow circles. Furthermore,
each variable node which will benefit from guessing $v_i$ is shown as a hollow circle with the letter
"A" in the figure. The key observation is that, for node $v_i$, the two neighboring accumulator
output nodes, i.e., $v_{i-1}$ and $v_{i+1}$, correspond to non-erased bits. This implies that setting
the value of $v_i$ will affect at least 6 other variable nodes of the rate 1/3 RA code.

Now, we are ready to report our simulation results. First, we show the performance of the proposed
algorithm with RA codes of different rates without variable node selection in Fig. 2 (i.e., we select
the first unerased variable node in the RA degree-2 variable-node-chain and set $N_{max} = 1$).
Here, the number of information bits $nR = 16384$ is fixed for all rates, to make the number of
users $M$ the same. We observe that, without selecting the variable node as shown in Fig. 1, the
probability of misidentification $P_m$ is high for rate 1/3.

Fig. 3. Probability of misidentification under two-pirate averaging attack using rate 1/3 RA code and modified BP algorithm
with different $N_{max}$.

This performance can be improved by the proposed algorithm for variable node selection and by
increasing $N_{max}$, as depicted in Fig. 3. Finally, in Fig. 4 we report $P_m$ for different code
lengths $n$ with $N_{max} = 2$.

Finally, we note that our algorithm is similar, in spirit, to the proposed guessing algorithm in 
[7]. The critical difference is that the structure of our problem ensures that the guessed bit always 
corresponds to one of the pirates, and hence, we do not need to worry about the possibility of 
contradictions as the iteration proceeds. 

B. The Marking Assumption: The Memoryless Attack 

In this subsection, we report our simulation results for the two-pirate memoryless attack. In
this attack, when the pirates encounter a detectable position, they choose 0 or 1 independently and
with equal probability to form the forged copy. We use rate 1/8, 1/9 and 1/10 ARA codes
based on the low rate protographs presented in [8]. The protographs of the codes are depicted
in Fig. 5.

Fig. 4. Probability of misidentification under two-pirate averaging attack using rate 1/3 RA code and modified BP algorithm
with different code lengths n.

For a formal description of the ARA codes, we refer the interested readers to [8], [14]
and references therein. Decoding is done iteratively using the BP framework with a maximum
of 60 iterations. Here, the decoder treats the forged fingerprint as the output of
a BSC with crossover probability equal to 0.25. In Fig. 6, the probability of misidentification $P_m$
is depicted versus code length for different rates. As shown in the figure, a vanishingly small
misidentification probability is achievable for rate 1/9, which is about an order of
magnitude higher than the best result available in the literature for explicit fingerprinting codes.
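The BSC interpretation used by the decoder can be illustrated numerically. The following Python sketch simulates the two-pirate memoryless attack on i.i.d. uniform fingerprints, which is an assumption of this example (the simulations in the paper use ARA codewords), and estimates the fraction of positions in which the forged copy differs from one pirate. It also evaluates 1 - H(0.25), the rate quoted in the abstract for this setting; the function names and the length n are hypothetical.

```python
import math
import random

def memoryless_attack(x1, x2, rng):
    """Keep undetectable positions; pick 0 or 1 uniformly in detectable ones."""
    return [a if a == b else rng.randint(0, 1) for a, b in zip(x1, x2)]

def binary_entropy(p):
    """Binary entropy function H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rng = random.Random(0)   # fixed seed, for reproducibility of the illustration
n = 20000                # hypothetical fingerprint length
x1 = [rng.randint(0, 1) for _ in range(n)]
x2 = [rng.randint(0, 1) for _ in range(n)]

y = memoryless_attack(x1, x2, rng)
crossover = sum(u != v for u, v in zip(y, x1)) / n   # empirical flip rate seen by pirate 1
rate_bound = 1 - binary_entropy(0.25)
print(crossover, rate_bound)
```

Under this assumed ensemble about half of the positions are detectable and the forged bit disagrees with a given pirate in about half of those, which gives the crossover probability of roughly 1/4; the rates 1/8, 1/9 and 1/10 used in the simulations all lie below 1 - H(0.25).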

VI. Conclusion 

This paper developed an information theoretic framework for the design of low complexity
coding/decoding techniques for fingerprinting. More specifically, we established the superior
performance of the minimum distance decoder and validated our theoretical claims via explicit
construction of BP encoding/decoding schemes.

Fig. 5. Protographs of rate 1/8, 1/9, 1/10 ARA codes.

In the averaging attack scenario, our framework
was inspired by the equivalence between our problem and the BEC. We also showed that the
worst case attack, under the marking assumption, is equivalent to a BSC with a cross-over
probability equal to 1/4. Our approach for the averaging attack can handle arbitrary coalition
sizes, whereas it was shown that the MD decoder can recover from marking assumption attacks only
with coalitions composed of two pirates. This negative result motivates our current investigations on
more sophisticated approaches for pirate tracing using the intimate connection between collusion
in digital fingerprinting and multiple access channels.

Appendix I 
On non-close pairs in random ensemble 

We will examine the probability of non-close pairs for the random i.i.d. and linear codebook
ensembles, and show that, with high probability, these events do not occur.

Fig. 6. Probability of misidentification for ARA codes with different rates and code lengths, under two-pirate memoryless
attack.

A. i.i.d codebook ensemble 

For a codebook $C$ in the i.i.d. ensemble and $1 \le d \le n$, define the number of unordered pairs
of codewords $(x_i, x_j)$ with $i \ne j$ in $C$ at distance $d$ apart as

$$S_C(d) = \sum_{i=1}^{M} \sum_{j=1}^{i-1} \Phi\{d_H(x_i, x_j) = d\}, \qquad (22)$$

where $\Phi(\cdot)$ is the indicator function. In [16], it is established that with probability going to one
as $n \to \infty$

$$S_C(d) = \begin{cases} 2^{n(2R + H(d/n) - 1)}, & n\,\delta_{GV}(2R) < d < n(1 - \delta_{GV}(2R)) \\ 0, & \text{otherwise,} \end{cases} \qquad (23)$$

where $\delta_{GV}(\cdot)$ is the Gilbert-Varshamov distance, which for $0 < R < 1$ is defined as the
root $\delta < 0.5$ of the equation $H(\delta) = 1 - R$, and $\delta_{GV}(R)$ is zero for $R > 1$. Using (23), we can
write the probability of non-close pairs in the codes of the random ensemble as

$$\frac{\ldots}{2^{2nR}} < \frac{\ldots}{2^{2nR}}, \qquad (24)$$

which goes exponentially to zero as $n \to \infty$.
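For concreteness, the Gilbert-Varshamov distance defined above can be evaluated numerically. The following Python sketch is an illustration only: it finds the root $\delta < 0.5$ of $H(\delta) = 1 - R$ by bisection, and the function names, the tolerance and the treatment of the boundary value R = 1 (returned as zero, the limit of the root) are choices made for this example.

```python
import math

def binary_entropy(p):
    """Binary entropy function H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gv_distance(rate, tol=1e-12):
    """Root delta < 0.5 of H(delta) = 1 - rate; zero when rate >= 1."""
    if rate >= 1:
        return 0.0
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # H is increasing on [0, 0.5], so the root lies to the right when H(mid) is too small.
        if binary_entropy(mid) < 1 - rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pair_exponent(rate, delta):
    """Exponent 2R + H(delta) - 1 of the pair count in (23), with delta = d/n."""
    return 2 * rate + binary_entropy(delta) - 1

delta_half = gv_distance(0.5)
print(delta_half, pair_exponent(0.25, delta_half))
```

The exponent in (23) vanishes exactly at $\delta = \delta_{GV}(2R)$, which is why the pair count is declared zero outside the stated range of distances.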

B. Random binary linear codebook ensemble 

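For a code $C$ in the linear ensemble and $1 \le d \le n$, by the symmetry of linear codes we can
write

$$S_C(d) = \sum_{i=1}^{M} \sum_{j=1}^{i-1} \Phi\{d_H(x_i, x_j) = d\} = \frac{1}{2} \sum_{i=1}^{M} \sum_{j \ne i} \Phi\{d_H(x_i, x_j) = d\} = 2^{nR-1} N_C(d), \qquad (25)$$

where $N_C(d) := \sum_{j \ne i} \Phi\{d_H(x_i, x_j) = d\}$, which does not depend on $i$. In [16], it is
shown that with probability going to one as $n \to \infty$

$$N_C(d) = \begin{cases} 2^{n(R + H(d/n) - 1)}, & n\,\delta_{GV}(R) < d < n(1 - \delta_{GV}(R)) \\ 0, & \text{otherwise.} \end{cases} \qquad (26)$$

Therefore, the average probability of a pair being non-close can be written as

$$\frac{\ldots}{2^{2nR}} < \frac{\ldots}{2^{2nR}},$$

which again goes exponentially to zero as $n \to \infty$.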

Appendix II 
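Computation of $M_b(l, |\mathcal{E}|, |\mathcal{E}| - 1)$

We will show that for $l \ge |\mathcal{E}|$

$$M_b(l, |\mathcal{E}|, |\mathcal{E}| - 1) = M_b(|\mathcal{E}| - 1, l, |\mathcal{E}| - 1)\,(2^{|\mathcal{E}|} - 1). \qquad (28)$$

To this end, by symmetry,

$$M_b(l, |\mathcal{E}|, |\mathcal{E}| - 1) = M_b(|\mathcal{E}|, l, |\mathcal{E}| - 1).$$

And from Appendix A of [10] and $|\mathcal{E}| \le l$, the RHS equals

$$M_b(|\mathcal{E}|, l, |\mathcal{E}| - 1) = M_b(|\mathcal{E}| - 1, l, |\mathcal{E}| - 1)\,2^{|\mathcal{E}| - 1} + M_b(|\mathcal{E}| - 1, l, |\mathcal{E}| - 2)\,(2^l - 2^{|\mathcal{E}| - 2}).$$

From Appendix A of [10], we also have the following recursive formula for $j = 1, \ldots, |\mathcal{E}| - 2$

$$M_b(|\mathcal{E}| - j, l, |\mathcal{E}| - 1 - j) = M_b(|\mathcal{E}| - 1 - j, l, |\mathcal{E}| - 1 - j)\,2^{|\mathcal{E}| - 1 - j} + M_b(|\mathcal{E}| - 1 - j, l, |\mathcal{E}| - 2 - j)\,(2^l - 2^{|\mathcal{E}| - 2 - j}).$$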
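And $M_b(|\mathcal{E}|, l, |\mathcal{E}| - 1)$ equals

$$\sum_{j=1}^{|\mathcal{E}| - 1} \Big\{ M_b(|\mathcal{E}| - j, l, |\mathcal{E}| - j)\,2^{|\mathcal{E}| - j} \prod_{p=1}^{j-1} (2^l - 2^{|\mathcal{E}| - 1 - p}) \Big\} + M_b(1, l, 0)\,(2^l - 1) \prod_{p=1}^{|\mathcal{E}| - 2} (2^l - 2^{|\mathcal{E}| - 1 - p}),$$

where $M_b(1, l, 0) = 1$.

Finally, substituting the number of full-rank matrices, $M_b(m, l, m) = \prod_{i=0}^{m-1} (2^l - 2^i)$ for
$m \le l$, into the expression above, we obtain

$$M_b(|\mathcal{E}|, l, |\mathcal{E}| - 1) = \sum_{i=1}^{|\mathcal{E}|} M_b(|\mathcal{E}| - 1, l, |\mathcal{E}| - 1)\,2^{|\mathcal{E}| - i},$$

and it is easy to check that the above formula equals (28).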
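The identity (28) can be verified numerically for small sizes. The sketch below assumes that $M_b(m, n, r)$ denotes the number of $m \times n$ binary matrices of rank $r$, which is consistent with the recursion quoted from Appendix A of [10]; the function names are hypothetical. It implements that recursion, cross-checks one value against a brute-force rank computation over GF(2), and then tests (28) over a range of sizes.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def count_rank(rows, cols, rank):
    """Number of rows x cols binary matrices of the given rank (assumed meaning of M_b)."""
    if rank < 0 or rank > min(rows, cols):
        return 0
    if rank == 0:
        return 1  # only the all-zero matrix
    # The last row either lies in the span of the others (2^rank choices)
    # or increases the rank by one (2^cols - 2^(rank-1) choices).
    return (count_rank(rows - 1, cols, rank) * 2 ** rank
            + count_rank(rows - 1, cols, rank - 1) * (2 ** cols - 2 ** (rank - 1)))

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integers (bit masks)."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

# Brute-force count of 2 x 3 binary matrices of rank 1.
brute = sum(1 for m in product(range(2 ** 3), repeat=2) if gf2_rank(m) == 1)

# Check identity (28) for small |E| = k and l >= k.
identity_holds = all(
    count_rank(l, k, k - 1) == count_rank(k - 1, l, k - 1) * (2 ** k - 1)
    for k in range(1, 6) for l in range(k, 9)
)
print(brute, count_rank(2, 3, 1), identity_holds)
```

This only confirms the identity for the sizes tested; the general statement is established by the derivation above.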